The Curse of Dimensionality & Building a Deeper CNN
Let's look at what happens to our classic SVM when we move away from tiny MNIST digits to harder, real-world color images.
The Problem with Flattening
To feed harder images (like 32x32 color images from CIFAR-10) into an SVM, we must flatten them into a 1D vector. Since color images have 3 RGB channels, the math becomes:
\( 3 \text{ channels} \times 32 \text{ height} \times 32 \text{ width} = \mathbf{3,072} \text{ features} \)
As the number of features grows, SVM training becomes far more expensive: kernel SVMs scale poorly with both the feature count and the number of training samples (roughly quadratic to cubic in the sample count). The SVM becomes incredibly slow (CPU bound) and much less accurate, because flattening discards all spatial relationships between neighboring pixels.
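To make the feature count concrete, here is a quick sketch in NumPy (using a random array as a stand-in for a real CIFAR-10 image) showing that flattening a 3x32x32 image yields exactly 3,072 features:

```python
import numpy as np

# A single CIFAR-10-sized color image: 3 channels x 32 height x 32 width
image = np.random.rand(3, 32, 32)

# Flattening into a 1D vector for the SVM discards the 2D neighborhood
# structure -- a pixel and the pixel directly below it end up 32 slots apart.
features = image.reshape(-1)
print(features.shape)  # (3072,)
```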
Building a Deeper Brain: CIFARCNN
To handle these more complex, harder images, we need a deeper brain. We upgrade from our basic model to CIFARCNN. Here is how it scales up:
Upgrading the Architecture
- More Filters: `conv1` has 32 filters, `conv2` has 64 filters.
- Larger Fully Connected Layer: We increase the density to 512 neurons.
- Why? More filters and larger layers give the network the capacity to learn more complex patterns (like the texture of dog fur versus the smooth, reflective metal of a car).
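The upgrades above can be sketched as a PyTorch module. The filter counts (32, 64) and the 512-neuron fully connected layer come from the text; the kernel sizes, padding, and max-pooling choices are assumptions filled in to make the sketch runnable, not details specified by the original:

```python
import torch
import torch.nn as nn

class CIFARCNN(nn.Module):
    """A sketch of the deeper CNN: 32 -> 64 filters, then a 512-neuron FC layer."""

    def __init__(self, num_classes=10):
        super().__init__()
        # Kernel size 3 with padding 1 preserves spatial size (an assumption)
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)   # 32 filters
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)  # 64 filters
        self.pool = nn.MaxPool2d(2, 2)          # halves height and width
        self.fc1 = nn.Linear(64 * 8 * 8, 512)   # larger fully connected layer
        self.fc2 = nn.Linear(512, num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))  # 3x32x32  -> 32x16x16
        x = self.pool(self.relu(self.conv2(x)))  # 32x16x16 -> 64x8x8
        x = torch.flatten(x, 1)                  # -> 4096 features per image
        x = self.relu(self.fc1(x))               # -> 512
        return self.fc2(x)                       # -> one score per class

# Usage: a batch of one 32x32 RGB image produces 10 class scores
model = CIFARCNN()
scores = model(torch.randn(1, 3, 32, 32))
print(scores.shape)  # torch.Size([1, 10])
```

Note that unlike the SVM, the convolutional layers see the image as a 3x32x32 grid, so spatial context is preserved until the final flatten before the fully connected layers.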